Guard SDK Library
The dynamofl.guard methods enhance safety and guardrails around models. This page divides the SDK methods into the following sections:
- alignment: functions that assist in aligning models using preference data, such as DPO and RLHF.
- helper: functions that assist in a variety of safety and guardrail use cases.
Installation
Please follow the installation steps in the page titled "Installing ML SDK Libraries."
```python
from dynamofl.guard import *
```
Release Notes
Please see the release notes for dynamofl.guard in the release notes section of the documentation sidebar.
Alignment Methods
Method generate_prompts_relevant_to_policy()
Returns
List of prompts suitable to test or train LLMs on a specified custom alignment policy. Writes prompts using both diverse and in-domain algorithmic processes.
Parameters
- policy (str): Alignment guideline to adhere to.
- domain (str): Domain of the language model / topics it should answer. E.g. "General chatbot for finance customers."
- example_prompts (list[str]): List of example prompts.
- num_to_generate (int, default=8): Number of prompts to generate. Maximum 2048.
- diverse_ratio (float, default=0.5): Ratio of prompts that are diverse. With the default of 0.5, 50% of prompts will be diverse and 50% will be in-domain.
- example_subtopics (list[str], default=[], optional): List of example subtopics used to generate in-domain prompts. By default the input is an empty list and the algorithm will automatically generate subtopics (which it will print out for review if verbose is set to True).
- verbose (bool, default=False): Print statistics.
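As a sketch of how the diverse_ratio parameter splits the prompt budget; the exact rounding rule below is an assumption, since the docs only state the ratio:

```python
def split_prompt_budget(num_to_generate: int = 8, diverse_ratio: float = 0.5) -> tuple[int, int]:
    """Split the prompt budget into (diverse, in-domain) counts.

    Hypothetical helper mirroring the documented defaults; not part of
    the dynamofl.guard API.
    """
    num_diverse = round(num_to_generate * diverse_ratio)
    return num_diverse, num_to_generate - num_diverse
```

With the default num_to_generate=8 and diverse_ratio=0.5, four prompts are diverse and four are in-domain.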
Method formulate_critic_revision_prompts()
Returns
A tuple (critic_prompt, revision_prompt):
- critic_prompt is a string used to prompt for critiquing a response. E.g. "Discuss whether the response contained anything that may be interpreted as offensive or harmful. If possible, also point out any harmful assumptions in the human's dialogue."
- revision_prompt is a string used to prompt for revising a response. E.g. "Please write a new version of the response that removes offensive and harmful content, and politely addresses any problematic assumptions from the human."
Parameters
- policy (str): Alignment guideline to adhere to.
- verbose (bool, default=False): Print statistics.
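To illustrate the shape of the returned tuple, here is a hypothetical pair of templates built from a policy string. The actual method derives its prompts from the policy (the exact wording will differ), so this is only a sketch of the return value:

```python
def make_critic_revision_prompts(policy: str) -> tuple[str, str]:
    # Hypothetical templates, not the SDK's generated wording.
    critic_prompt = (
        "Discuss whether the response contained anything that may violate "
        f"the following policy: {policy} If possible, also point out any "
        "harmful assumptions in the human's dialogue."
    )
    revision_prompt = (
        "Please write a new version of the response that removes content "
        f"violating the following policy: {policy} Politely address any "
        "problematic assumptions from the human."
    )
    return critic_prompt, revision_prompt
```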
Method write_better_responses_critique()
This method improves base responses so that they comply with a policy.
Returns
Responses to prompts as a list of dictionaries:

```python
[
    {"prompt": ..., "rejected": ..., "chosen": ..., "rejected_critique": ...}
]
```
This is Dynamo's adaptive CAI approach.
- The base responses are set as the "rejected" response.
- An LLM is asked to critique the base response.
- An LLM is asked to generate a better "chosen" response based on the critique.
Parameters
- policy (str): Alignment guideline to adhere to.
- critic_prompt (str): Prompt used by the automated red-teaming model to judge responses.
- revision_prompt (str): Prompt used by the automated red-teaming model to improve upon responses.
- prompts_responses_list (list[dict]): Must be a list of dicts in the format [{"prompt": ..., "response": ...}].
- filter_good_responses (bool, default=False): Throw away datapoints where the model is already doing well. If True, we first critique each "response" in prompts_responses_list and throw it out if it is already rated at the top score.
- verbose (bool, default=False): Print statistics.
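The three steps above can be sketched as follows. Here critique_fn and revise_fn are hypothetical stand-ins for the SDK's LLM calls; only the record keys match the documented output format:

```python
def build_preference_records(prompts_responses_list, critique_fn, revise_fn):
    """Assemble preference records in the documented output format.

    critique_fn(prompt, response) and revise_fn(prompt, response, critique)
    stand in for the LLM calls the SDK performs internally.
    """
    records = []
    for item in prompts_responses_list:
        critique = critique_fn(item["prompt"], item["response"])
        chosen = revise_fn(item["prompt"], item["response"], critique)
        records.append({
            "prompt": item["prompt"],
            "rejected": item["response"],   # base response becomes "rejected"
            "chosen": chosen,               # revised response becomes "chosen"
            "rejected_critique": critique,
        })
    return records
```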
Helper Methods
Method generate_data_api()
Returns
JSON output from LLM APIs.
- If your input is a single string, this returns a string.
- If your input is a list of strings, this returns a list of strings.
Note: by default, enforce_output_key expects the LLM output to be in the format

```
{
    "generated": <output>
}
```

This method will return the <output>. It will ignore leading and trailing text before and after the curly braces.
Use environment variables to select the model and supply your API key / endpoint.

```python
import os

# 1. Specify your API model
# Options: ["mistral-tiny", "mistral-small-latest", "mistral-medium-latest", "mistral-large-latest",
#           "gpt-4", "gpt-3.5-turbo", "claude-3-opus-20240229", "claude-3-sonnet-20240229",
#           "custom"]
# Note: "custom" refers to any callable model API endpoint, e.g. Databricks' serving of Mistral.
os.environ['DYNAMO_DATA_GENERATION_MODEL'] = "custom"

# 2. Specify your API key
os.environ['DYNAMO_DATA_GENERATION_API_KEY'] = 'dapi[EXAMPLE]d0'  # databricks

# 3. If DYNAMO_DATA_GENERATION_MODEL == 'custom', then you must also specify the endpoint
os.environ['DYNAMO_DATA_GENERATION_ENDPOINT'] = 'https://dbc-[EXAMPLE].cloud.databricks.com/serving-endpoints/databricks-mixtral-8x7b-instruct/invocations'
```
Parameters
- prompt (Union[str, List[str]]): Input to the model. The prompt must tell the model to output a JSON with one key (and one key only): "generated". If prompt is a list, the model will generate a list of outputs; otherwise it will output a single string.
- temperature (float): Used during generation; must be between 0.0 and 1.0.
- model (str, default='mistral-small-latest'): Model to call. mistral-medium-latest is the better model, equivalent to or better than GPT-3.5. mistral-small-latest is Mixtral-8x7b, which is faster than mistral-medium-latest. gpt-4 will call the latest default GPT-4 model on OpenAI's API.
- verbose (bool, default=False): Print the output.
- endpoint (str, default=None): Endpoint used for any custom API via requests.
- api_key (str, default=None): Mistral API key. By default, we use the DynamoFL org's API token designed for external demos.
- required_characters (list[str], default=[]): List of characters to check for in the output. For example, passing in ['[', ']'] will raise a custom error if the output does not contain both '[' and ']'; the model is told which character is missing and asked to fix its output.
- enforce_dictionary (bool, default=True): Enforce that the model output is a dictionary. Cuts off all characters before the first '{' and after the last '}'. Raises an error and reprompts the model if the output does not contain a valid dictionary.
- enforce_output_key (str, default="generated"): Enforce that the JSON output has a certain key, and automatically return the value of this key in the JSON. If None, return the entire JSON.
- max_tokens (int, default=512): Max number of tokens to generate. Note: the Mistral API does not support max tokens at the moment. Note: OpenAI has a maximum of 4000 for max_tokens + len(prompt).
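The enforce_dictionary / enforce_output_key post-processing described above can be illustrated with a minimal sketch. This mirrors the documented behavior (strip text outside the outermost braces, parse, return the key's value), not the SDK's actual implementation, and omits the reprompting step:

```python
import json

def extract_generated(raw_output: str, enforce_output_key="generated"):
    """Strip text outside the outermost curly braces, parse the JSON, and
    return the value under enforce_output_key (or the whole dict if None).
    """
    start = raw_output.index("{")
    end = raw_output.rindex("}")
    parsed = json.loads(raw_output[start:end + 1])
    if enforce_output_key is None:
        return parsed
    return parsed[enforce_output_key]
```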
Method find_similar_strings()
Returns
List of indices. Each index identifies an element of the list whose cosine similarity to another string in the list exceeds similarity_threshold.
- Note: uses an LLM to generate embeddings and cosine similarity.
- Note: uses sentence_transformers.
Parameters
- string_list (list[str]): List of strings to compare for similarity.
- similarity_threshold (float, default=0.75): Threshold for cosine similarity.
- model_id (str, default='all-MiniLM-L12-v2'): Model used to compute embeddings. Options: all-MiniLM-L12-v2, all-MiniLM-L6-v2.
- verbose (bool, default=False): Print examples of sentences that were similar.
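The comparison itself can be sketched with toy vectors. In the SDK the vectors come from sentence_transformers embeddings, and the pairwise loop here is an assumption about the strategy rather than the SDK's code:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def similar_indices(vectors, similarity_threshold=0.75):
    # Flag every vector whose similarity to some *other* vector exceeds
    # the threshold, mirroring the documented return value.
    flagged = []
    for i, vi in enumerate(vectors):
        for j, vj in enumerate(vectors):
            if i != j and cosine_similarity(vi, vj) > similarity_threshold:
                flagged.append(i)
                break
    return flagged
```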
Method jsonl_to_csv()
Returns
Writes the contents of a JSONL file into a CSV file.
Parameters
- jsonl_file_path (str): JSONL location to copy contents from.
- csv_file_path (str): CSV location to write contents into.
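A rough stdlib equivalent of this conversion, assuming flat JSON records with consistent keys (the SDK's handling of nested or ragged records is not documented):

```python
import csv
import json

def jsonl_to_csv(jsonl_file_path: str, csv_file_path: str) -> None:
    # Read one JSON object per line from the JSONL file.
    with open(jsonl_file_path) as f:
        rows = [json.loads(line) for line in f if line.strip()]
    if not rows:
        return
    # Use the keys of the first record as the CSV header.
    with open(csv_file_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```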
Method csv_to_jsonl()
Returns
Writes the contents of a CSV file into a JSONL file.
Parameters
- csv_file_path (str): CSV location to copy contents from.
- jsonl_file_path (str): JSONL location to write contents into.
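And the reverse direction, again as a hedged stdlib sketch rather than the SDK implementation (note that CSV values round-trip as strings):

```python
import csv
import json

def csv_to_jsonl(csv_file_path: str, jsonl_file_path: str) -> None:
    # Each CSV row becomes one JSON object per line in the output file.
    with open(csv_file_path, newline="") as f:
        rows = list(csv.DictReader(f))
    with open(jsonl_file_path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
```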